Protein fold recognition by total alignment probability.

Authors

  • J R Bienkowska
  • L Yu
  • S Zarakhovich
  • R G Rogers
  • T F Smith
Abstract

We present a protein fold-recognition method that uses a comprehensive statistical interpretation of structural Hidden Markov Models (HMMs). The structure/fold recognition is done by summing the probabilities of all sequence-to-structure alignments. The optimal alignment can be defined as the most probable, but suboptimal alignments may have comparable probabilities. These suboptimal alignments can be interpreted as optimal alignments to the "other" structures from the ensemble or optimal alignments under minor fluctuations in the scoring function. Summing probabilities for all alignments gives a complete estimate of sequence-model compatibility. In the case of HMMs that produce a sequence, this reflects the fact that due to our indifference to exactly how the HMM produced the sequence, we should sum over all possibilities. We have built a set of structural HMMs for 188 protein structures and have compared two methods for identifying the structure compatible with a sequence: by the optimal alignment probability and by the total probability. Fold recognition by total probability was 40% more accurate than fold recognition by the optimal alignment probability. Proteins 2000;40:451-462.
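The difference between scoring by the optimal alignment and scoring by the total probability can be illustrated on a toy HMM. The sketch below is not the authors' structural HMM: the two states (`core`, `loop`), the two-letter alphabet (`H` for hydrophobic, `P` for polar), and all probabilities are hypothetical values chosen only so the arithmetic can be checked by hand. The same recursion gives the most probable path (Viterbi) when paths are combined with `max`, and the total probability (forward algorithm) when they are combined with a log-sum.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) over a list of log-probabilities."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_score(obs, start, trans, emit, combine):
    """Log-probability of `obs` under an HMM.

    combine=max       -> probability of the single best state path (Viterbi)
    combine=logsumexp -> probability summed over all state paths (forward)
    Assumes every probability used is nonzero (toy model only).
    """
    states = list(start)
    prev = {s: math.log(start[s] * emit[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        prev = {
            s: combine([prev[r] + math.log(trans[r][s]) for r in states])
            + math.log(emit[s][symbol])
            for s in states
        }
    return combine(list(prev.values()))


# Hypothetical toy model: two structural states, two residue classes.
START = {"core": 0.5, "loop": 0.5}
TRANS = {"core": {"core": 0.5, "loop": 0.5},
         "loop": {"core": 0.5, "loop": 0.5}}
EMIT = {"core": {"H": 0.8, "P": 0.2},
        "loop": {"H": 0.3, "P": 0.7}}

total = hmm_log_score("HP", START, TRANS, EMIT, logsumexp)  # all paths
best = hmm_log_score("HP", START, TRANS, EMIT, max)         # best path only

print("total probability:", round(math.exp(total), 4))
print("best-path probability:", round(math.exp(best), 4))
```

For the sequence `HP` the four possible state paths have probabilities 0.04, 0.14, 0.015 and 0.0525. The best path alone accounts for 0.14, while the total is 0.2475, so in this toy case the suboptimal paths carry a substantial share of the probability that a best-path score ignores.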

Related articles

Do aligned sequences share the same fold?

Sequence comparison remains a powerful tool to assess the structural relatedness of two proteins. To develop a sensitive sequence-based procedure for fold recognition, we performed an exhaustive global alignment (with zero end gap penalties) between sequences of protein domains with known three-dimensional folds. The subset of 1.3 million alignments between sequences of structurally unrelated d...

Combining Secondary Structure Element Alignment and Profile-Profile Alignment for Fold Recognition

One of the most intensely studied problems of bioinformatics is the prediction of a protein structure from an amino acid sequence. In fold recognition, one reduces this problem to assigning a protein of unknown structure to one of the known fold classes as defined in the SCOP or CATH classifications. Here, we combine two alignment methods, secondary structure element alignment and log average p...

Effect of Secondary Structure Prediction on Protein Fold Recognition and Database Search

Hydrophobic long-range interactions and local polypeptide chain propensities are the major factors directing protein folding. Incorporating both these terms in addition to the Dayhoff matrix helps us to increase the quality of protein fold recognition via sequence-structure alignment. We have shown that the results of secondary structure prediction substantially increase the sensitivity of the fold reco...

Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping.

In protein fold recognition, a protein is classified into one of its folds. The recognition of a protein fold can be done by employing feature extraction methods to extract relevant information from protein sequences and then by using a classifier to accurately recognize novel protein sequences. In the past, several feature extraction methods have been developed but with limited recognition acc...

Competitive assessment of protein fold recognition and alignment accuracy.

The predictions made for fold recognition and modeling accuracy at the 1996 Critical Assessment of Structure Prediction meeting (CASP2) were assessed to discover which groups did best. With 32 groups making a total of 369 predictions, it was necessary to use simple criteria for distinguishing between the entries. By focusing on the predictors' ability to use the sequence of the unknown target s...

Journal:
  • Proteins

Volume 40, Issue 3

Pages 451-462

Publication date: 2000